Computer Vision - Transfer Learning and FCNNs¶

Lucas Kruitwagen¶

Smith School of Enterprise and the Environment

lucas.kruitwagen@gmail.com¶

@lucaskruitwagen¶

https://github.com/Lkruitwagen¶

Contents today:¶

  1. Advances in CNNs

  2. Transfer Learning

  3. Let's code! TF + transfer learning

  4. "Fully Convolutional" Neural Networks

  5. Let's code! TF + transfer learning + FCNNs

  6. Tutorial: Google Colab GPU-enhanced runtime

  7. Tutorial: Machine Learning Experiments

Advances in CNNs¶

Where we left off...

drawing

We saw that convolution with machine-learned filters was the leading method for solving computer vision classification problems.

drawing

AlexNet 2012

drawing

Contribution:

  • GPU training
  • Parallelised GPU training
  • ReLU activation + dropout

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems 25 (2012): 1097-1105.

VGG-16, -19 2014

drawing

Contribution:

  • Achieving deeper representations by stacking convolutional layers

Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition

Inception (GoogLeNet) 2014

Inception cell Inception Architecture
drawing drawing

Contribution:

  • First paper to explicitly use 'block'-based architectures.
  • 1x1 convolution, an early precursor to grouped convolutions
  • pooled, 1x1, 3x3, and 5x5 concatenated features, giving flexibility in the spatial-semantic tradeoff.

Szegedy, C., Liu, W., Jia, Y., et al. (2014) Going Deeper with Convolutions.
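A 1x1 convolution is just a per-pixel linear map across channels, which is what makes it so cheap for reducing channel depth before the larger kernels. A minimal NumPy sketch (hypothetical shapes and random weights):

```python
import numpy as np

# feature map: height x width x 64 channels
x = np.random.rand(8, 8, 64)

# a 1x1 convolution with 16 filters is a 64 -> 16 matrix applied at every pixel
w = np.random.rand(64, 16)

y = x @ w          # broadcasts the matrix multiply over the spatial grid
print(y.shape)     # (8, 8, 16): channel depth reduced 4x, spatial size unchanged
```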

ResNet 2015

ResNet cell ResNet Architecture
drawing drawing

Contribution:

  • Residual skip-connections solve the degradation problem of deeper networks
  • Popularised batch normalisation

He, K., Zhang, X., Ren, S., and Sun, J. (2015) Deep Residual Learning for Image Recognition.
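The core idea can be sketched in a few lines of NumPy (hypothetical weights; real ResNet blocks use convolutions and batch normalisation): the block learns a residual F(x) and adds it back to its input, so an 'unneeded' layer only has to drive F(x) towards zero rather than learn an identity mapping from scratch.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F is a small two-layer transformation."""
    return relu(w2 @ relu(w1 @ x) + x)   # the skip connection adds x back in

x = np.array([1.0, 2.0, 3.0])

# if the learned residual is zero, the block passes x through unchanged --
# this is what makes very deep stacks trainable
w_zero = np.zeros((3, 3))
print(residual_block(x, w_zero, w_zero))  # [1. 2. 3.]
```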

Aside: What is batch normalisation?

Batch normalisation layers are placed after convolutional layers to normalise the activations over each batch during training, then rescale them with a learned scale and shift. The subsequent layer therefore always receives, as input, data with roughly the same $\mu$ and $\sigma$. This has the effect of:

  • stabilising training. A larger learning rate can then be used.
  • regularisation. Because data extremes are muted between batches, the model is less overfitted.
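As a minimal sketch in plain NumPy (gamma and beta default values are hypothetical; in practice they are learned parameters), batch normalisation of a batch of activations looks like:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise a batch of activations to zero mean / unit variance,
    then rescale with learned parameters gamma (scale) and beta (shift)."""
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # standardised activations
    return gamma * x_hat + beta

x = np.array([[1.0, 50.0], [3.0, 150.0], [5.0, 250.0]])  # batch of 3, 2 features
y = batch_norm(x)
print(y.mean(axis=0))  # ~0 per feature
print(y.std(axis=0))   # ~1 per feature
```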

ResNeXt 2017

drawing

Contribution:

  • Implemented "cardinal" scaling within blocks (similar to the "parallel towers" of Inception)
    • a classic machine learning concept whereby employing an ensemble leads to specialisation

Xie, S., Girshick, R., Dollár, P., Tu, Z., and He, K. (2017) Aggregated Residual Transformations for Deep Neural Networks.

DenseNet 2018

DenseNet cell DenseNet Architecture
drawing drawing

Contribution:

  • Concatenation of lower-level features, enabling much narrower layers (fewer kernels per layer)

Huang, G., Liu, Z., van der Maaten, L., & Weinberger, K. Q. (2018) Densely Connected Convolutional Networks.
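The concatenation pattern can be sketched in NumPy (a hypothetical growth rate of 32 channels per layer; the conv layer is a stand-in with weights omitted):

```python
import numpy as np

def fake_conv(x, out_channels):
    """Stand-in for a convolutional layer: returns a feature map with
    the requested number of channels (weights omitted for brevity)."""
    h, w, _ = x.shape
    return np.random.rand(h, w, out_channels)

growth_rate = 32
x = np.random.rand(8, 8, 64)            # input feature map
for _ in range(3):                      # three layers in the dense block
    new_features = fake_conv(x, growth_rate)
    x = np.concatenate([x, new_features], axis=-1)  # each layer sees ALL earlier features
print(x.shape)  # (8, 8, 160): 64 + 3 * 32 channels
```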

These networks represent a fairly stable paradigm. Iterations then turned to architecture search and scaling:

AmoebaNet 2019

drawing

Contribution:

  • Use of evolutionary search (regularised evolution) to discover an optimal CNN architecture

Real, E., Aggarwal, A., Huang, Y., & Le, Q. V. (2019) Regularized Evolution for Image Classifier Architecture Search.

EfficientNet 2020

drawing

Contribution:

  • Novel network discovery plus a compound coefficient that uniformly scales width, depth, and resolution

Tan, M., & Le, Q. V. (2020) EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Performance comparison:

drawing

finally...

Transformers!

ViT 2020

drawing

Contribution:

  • Successfully applied attention-based transformers to computer vision
  • Bigger is better: pretraining on 300mn images, 632mn parameters

Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2020) An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

That's all well and good, but how can I use any of this with my own data? I only have 10 samples???

10? 100? 1,000? 10,000?

Transfer Learning¶

With transfer learning, we pretrain a model in one domain, and then transfer the model's knowledge of that domain to another. Finetuning in the new domain leads to a model with improved generalisability and performance.

drawing

drawing

We typically do this when we have a shortage of data in our target domain (e.g. labelled images of dog breeds), but a surplus of images in a similar domain (e.g. labelled images of animal types). We can transfer a model's learned knowledge of animal types to the problem of predicting dog breeds.

In practice, you will almost always start with a pretrained model - you wouldn't want to train VGG-19 yourself.

This means that you can begin many problems from the baseline of a model trained for many days by Microsoft, Google, or Oxford academics.

This is very common practice, with applications in computer vision and NLP.

There are many domain transfer and training tricks, but let's leave those for now.

Transfer Learning - how it works¶

  1. Design our ML experiment to benefit from domain transfer. Design part of our model to benefit from learning a task from another domain.
    • e.g. I want to obtain bounding boxes in images for a small number of classes. To do this I can build a separate header on a pretrained classification model.
    • counter-example: I want to detect anomalies in x-ray imagery. Pretraining on general-purpose imagery probably won't help with this problem.
  2. Pretrain part of our model on a large data corpus. We can even swap headers.
    • e.g. pretrain a fully-connected classification header, then drop it and train a segmentation header instead.
    • In many cases, I don't need to pretrain the model myself; I can just load the weights from a pretrained model.
  3. Finetune and retrain.
    • In some cases, we might have enough data to finetune our entire model (including the pretrained layers).
    • If we don't have enough data, we can freeze those layers and just retrain a new header, or whatever new layers our use case requires.

Let's Code! TF + Transfer Learning¶

Let's return to our Flowers problem. We had a dataset of 3,670 real pictures of flowers classified into one of five categories. With a basic AlexNet we quickly achieved a classifier accuracy of ~50%. Let's see if using transfer learning improves our outcome.

In [16]:
import os, sys, glob                  # some built-ins 
from random import shuffle            # shuffle a list of elements in-place

from PIL import Image                 # image manipulation
import requests                       # http requests
import matplotlib.pyplot as plt       # visualisation
import numpy as np                    # data manipulations
from scipy.signal import convolve2d   # to demo convolution
from sklearn.metrics import confusion_matrix
from skimage.io import imread         # read an image to a np array
from skimage.transform import resize  # resize an image
from skimage.util import crop, pad    # crop or pad an image

import tensorflow as tf
import tensorflow_datasets as tfds    # built-in MNIST
In [2]:
tf.config.list_physical_devices()     # let's check whether TF is GPU-ready
Out[2]:
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
 PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
In [17]:
### set a root directory for managing paths
root = os.path.abspath(os.path.join(os.getcwd(),'..'))
In [ ]:
### You can re-download the flowers data if you don't have it from the previous lecture
!wget -c https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz -O - | tar -xz -C {root}/data
In [4]:
### first, let's get all the image records
records = [{
    'flower':f.split('/')[-2],
    'path':f
} for f in glob.glob(os.path.join(root,'data','flower_photos','**','*.jpg'))]

print ('N records:',len(records))
print(records[0])
N records: 3670
{'flower': 'roses', 'path': '/home/jupyter/ox-sbs-ml-bd/data/flower_photos/roses/269037241_07fceff56a_m.jpg'}
In [5]:
shuffle(records)
In [6]:
### visualise our data
fig, axs = plt.subplots(4,6,figsize=(20,12))
axs=axs.flatten()
for ii_r,r in enumerate(records[0:24]):
    arr = imread(r['path'])
    axs[ii_r].imshow(arr)
    axs[ii_r].set_title(r['flower'])
plt.show()

We'll use our generator function again:

In [9]:
def flowers_generator(records, output_shape=(200,200), mode='random_crop'):
    ### a wrapper for our generator. Takes all our parameters and returns the generator.
    
    # one-hot encode our classes
    mapper = {'dandelion': 0, 'sunflowers': 1, 'daisy': 2, 'tulips': 3, 'roses': 4}

    def _generator():
        ### The internal generator must not take any parameters.

        for r in records:

            # io
            x = (imread(r['path'])).astype(np.float32)             # <- CHANGE HERE! Don't normalise.
            y = np.array(mapper[r['flower']]).astype(np.float32)

            # reduce dimension of array
            if mode=='resize':
                x = resize(x,output_shape)
            elif mode=='random_crop':
                crop_width = [(0,0)]*3
                pad_width  = [(0,0)]*3
                for ax in [0,1]:
                    if x.shape[ax]>output_shape[ax]:
                        crop_val=np.random.choice(x.shape[ax]-output_shape[ax])
                        crop_width[ax] = (crop_val, x.shape[ax]-output_shape[ax]-crop_val)
                    elif x.shape[ax]<output_shape[ax]:
                        pad_val = np.random.choice(output_shape[ax]-x.shape[ax])
                        pad_width[ax] = (pad_val,output_shape[ax]- x.shape[ax]-pad_val)
                        
                x = crop(x, crop_width)
                x = pad(x, pad_width)

            yield tf.convert_to_tensor(x), tf.convert_to_tensor(y)
            
    return _generator
In [10]:
trn_split=0.7
val_split=0.9
In [11]:
generator_obj_trn = flowers_generator(
    records[0:int(trn_split*len(records))], 
    output_shape=(200,200), 
    mode='resize'
)
generator_obj_val = flowers_generator(
    records[int(trn_split*len(records)):int(val_split*len(records))], 
    output_shape=(200,200), 
    mode='resize'
)
In [12]:
ds_flowers_trn = (
    tf.data.Dataset.from_generator(
     generator_obj_trn,
     output_signature=(
         tf.TensorSpec(shape=(200,200,3), dtype=tf.float32),
         tf.TensorSpec(shape=(), dtype=tf.float32)))
    ) \
    .cache().batch(128).prefetch(tf.data.experimental.AUTOTUNE)

ds_flowers_val = (
    tf.data.Dataset.from_generator(
     generator_obj_val,
     output_signature=(
         tf.TensorSpec(shape=(200,200,3), dtype=tf.float32),
         tf.TensorSpec(shape=(), dtype=tf.float32)))
    ) \
    .cache().batch(128).prefetch(tf.data.experimental.AUTOTUNE)

Both TF and PyTorch have built-in libraries for importing pretrained models. Let's look at the documentation for importing a pre-trained VGG-16 model.

model = tf.keras.applications.VGG16(
    include_top=True, 
    weights='imagenet', 
    input_tensor=None,
    input_shape=None, 
    pooling=None, 
    classes=1000,
    classifier_activation='softmax'
)

And we also have this warning:

Note: each Keras Application expects a specific kind of input preprocessing. For VGG16, call tf.keras.applications.vgg16.preprocess_input on your inputs before passing them to the model. vgg16.preprocess_input will convert the input images from RGB to BGR, then will zero-center each color channel with respect to the ImageNet dataset, without scaling.

Let's break down each of these options.

include_top: like many CNNs, VGG-16 has several fully-connected layers after the convolutional layers. If we pass False to this parameter, the VGG model will be instantiated without the FC layers, which we might want for other downstream tasks.

weights: this option allows us to instantiate a VGG-16 model with pretrained weights. Exactly what we want for transfer learning!

input_tensor, input_shape: specify a different input shape for the model if not using the pretrained fully-connected layer.

pooling: if not using the top FC layers, we can specify a different pooling function on the convolutional output, if we want.

classes: if not using ImageNet weights, we can instantiate the VGG network with a different number of output classes.

classifier_activation: if we want to use a different activation function (e.g. sigmoid for multi-label classification), we can specify that here.

We want to use ImageNet-pretrained weights, but we only have 5 classes. We'll drop the top FC layers and use global max pooling to flatten the convolutional output, then add our own fully-connected layers. Because we've removed the FC header, we can also use an input size of (200,200,3) to match our previous generator.

Per the warning, we'll also need to normalise our data in the same way that VGG16 did. Fortunately Keras has a built-in preprocessor for this.

In [13]:
def vgg16_premapper(_x, _y):                     # sample and target are now tf tensors
    return tf.keras.applications.vgg16.preprocess_input(_x), _y      # return the (image, label) tuple
In [14]:
ds_flowers_trn = ds_flowers_trn.map(vgg16_premapper, num_parallel_calls=tf.data.experimental.AUTOTUNE) 
ds_flowers_val = ds_flowers_val.map(vgg16_premapper, num_parallel_calls=tf.data.experimental.AUTOTUNE) 

Let's check the shape of our data.

In [15]:
a,b = next(ds_flowers_trn.as_numpy_iterator())
In [16]:
a.shape, a.max(), a.min(), b.shape, b.max(), b.min()
Out[16]:
((128, 200, 200, 3), 151.061, -123.68, (128,), 4.0, 0.0)
In [17]:
model_vgg = tf.keras.applications.VGG16(
    include_top=False, 
    input_shape=(200,200,3),
    weights='imagenet', 
    pooling='max',
)
In [18]:
model_vgg.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 200, 200, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 200, 200, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 200, 200, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 100, 100, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 100, 100, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 100, 100, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 50, 50, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 50, 50, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 50, 50, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 50, 50, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 25, 25, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 25, 25, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 25, 25, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 25, 25, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 12, 12, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 12, 12, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 12, 12, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 12, 12, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 6, 6, 512)         0         
_________________________________________________________________
global_max_pooling2d (Global (None, 512)               0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
In [19]:
# We don't have enough data to retrain all of VGG. Let's make our vgg model non-trainable.
model_vgg.trainable=False
In [20]:
# now make a new model using the nested VGG model
model = tf.keras.Sequential([
    model_vgg,
    tf.keras.layers.Dense(512),
    tf.keras.layers.Dropout(0.5), # Add a little bit of regularisation
    tf.keras.layers.Dense(512),
    tf.keras.layers.Dropout(0.5), # Add a little bit of regularisation
    tf.keras.layers.Dense(5),  
], name='flowers_vgg_classifier')
In [21]:
model.summary()
Model: "flowers_vgg_classifier"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
vgg16 (Functional)           (None, 512)               14714688  
_________________________________________________________________
dense (Dense)                (None, 512)               262656    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               262656    
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 2565      
=================================================================
Total params: 15,242,565
Trainable params: 527,877
Non-trainable params: 14,714,688
_________________________________________________________________
In [22]:
### or, if you want to see all the layers:
model = tf.keras.Sequential(
    [L for L in model_vgg.layers] + 
    [
        tf.keras.layers.Dense(512),
        tf.keras.layers.Dropout(0.5), # Add a little bit of regularisation
        tf.keras.layers.Dense(512),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(5),  
    ],
    name='flowers_vggext_classifier'
)
In [23]:
# a bit more verbose
model.summary()
Model: "flowers_vggext_classifier"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
block1_conv1 (Conv2D)        (None, 200, 200, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 200, 200, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 100, 100, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 100, 100, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 100, 100, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 50, 50, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 50, 50, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 50, 50, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 50, 50, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 25, 25, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 25, 25, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 25, 25, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 25, 25, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 12, 12, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 12, 12, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 12, 12, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 12, 12, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 6, 6, 512)         0         
_________________________________________________________________
global_max_pooling2d (Global (None, 512)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 512)               262656    
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 512)               262656    
_________________________________________________________________
dropout_3 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 5)                 2565      
=================================================================
Total params: 15,242,565
Trainable params: 527,877
Non-trainable params: 14,714,688
_________________________________________________________________
In [ ]:
## This pattern also works:
x = model_vgg.output
x = tf.keras.layers.Dense(512)(x)
x = tf.keras.layers.Dropout(0.5)(x)
x = tf.keras.layers.Dense(512)(x)
x = tf.keras.layers.Dropout(0.5)(x)
x = tf.keras.layers.Dense(5)(x)
model = tf.keras.Model(model_vgg.input, x)

See Keras documentation for more.

Let's train!

In [24]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),                          
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)
In [25]:
model.fit(
    ds_flowers_trn,
    epochs=20,
    validation_data=ds_flowers_val,
)
Epoch 1/20
21/21 [==============================] - 168s 6s/step - loss: 56.8317 - sparse_categorical_accuracy: 0.4769 - val_loss: 16.5332 - val_sparse_categorical_accuracy: 0.8025
Epoch 2/20
21/21 [==============================] - 20s 932ms/step - loss: 18.6563 - sparse_categorical_accuracy: 0.7666 - val_loss: 16.4549 - val_sparse_categorical_accuracy: 0.7916
Epoch 3/20
21/21 [==============================] - 20s 932ms/step - loss: 18.6675 - sparse_categorical_accuracy: 0.7764 - val_loss: 12.4930 - val_sparse_categorical_accuracy: 0.8297
Epoch 4/20
21/21 [==============================] - 20s 928ms/step - loss: 13.8148 - sparse_categorical_accuracy: 0.7971 - val_loss: 13.8022 - val_sparse_categorical_accuracy: 0.8311
Epoch 5/20
21/21 [==============================] - 20s 936ms/step - loss: 14.7291 - sparse_categorical_accuracy: 0.8192 - val_loss: 14.2620 - val_sparse_categorical_accuracy: 0.8365
Epoch 6/20
21/21 [==============================] - 20s 931ms/step - loss: 10.2870 - sparse_categorical_accuracy: 0.8343 - val_loss: 12.0203 - val_sparse_categorical_accuracy: 0.8460
Epoch 7/20
21/21 [==============================] - 20s 936ms/step - loss: 6.7719 - sparse_categorical_accuracy: 0.8728 - val_loss: 11.9677 - val_sparse_categorical_accuracy: 0.8365
Epoch 8/20
21/21 [==============================] - 20s 929ms/step - loss: 8.3594 - sparse_categorical_accuracy: 0.8512 - val_loss: 11.9430 - val_sparse_categorical_accuracy: 0.8474
Epoch 9/20
21/21 [==============================] - 20s 935ms/step - loss: 8.1771 - sparse_categorical_accuracy: 0.8594 - val_loss: 13.8571 - val_sparse_categorical_accuracy: 0.8447
Epoch 10/20
21/21 [==============================] - 20s 929ms/step - loss: 8.1539 - sparse_categorical_accuracy: 0.8507 - val_loss: 14.9796 - val_sparse_categorical_accuracy: 0.8283
Epoch 11/20
21/21 [==============================] - 20s 934ms/step - loss: 6.9935 - sparse_categorical_accuracy: 0.8831 - val_loss: 14.3296 - val_sparse_categorical_accuracy: 0.8338
Epoch 12/20
21/21 [==============================] - 20s 930ms/step - loss: 6.3539 - sparse_categorical_accuracy: 0.8912 - val_loss: 18.9413 - val_sparse_categorical_accuracy: 0.8120
Epoch 13/20
21/21 [==============================] - 20s 934ms/step - loss: 7.4660 - sparse_categorical_accuracy: 0.8772 - val_loss: 12.7482 - val_sparse_categorical_accuracy: 0.8501
Epoch 14/20
21/21 [==============================] - 20s 935ms/step - loss: 5.9989 - sparse_categorical_accuracy: 0.8979 - val_loss: 14.7906 - val_sparse_categorical_accuracy: 0.8283
Epoch 15/20
21/21 [==============================] - 20s 935ms/step - loss: 6.3284 - sparse_categorical_accuracy: 0.8947 - val_loss: 12.0919 - val_sparse_categorical_accuracy: 0.8597
Epoch 16/20
21/21 [==============================] - 20s 934ms/step - loss: 5.1112 - sparse_categorical_accuracy: 0.9164 - val_loss: 12.6545 - val_sparse_categorical_accuracy: 0.8569
Epoch 17/20
21/21 [==============================] - 20s 934ms/step - loss: 4.6218 - sparse_categorical_accuracy: 0.9100 - val_loss: 15.3938 - val_sparse_categorical_accuracy: 0.8270
Epoch 18/20
21/21 [==============================] - 20s 931ms/step - loss: 3.9294 - sparse_categorical_accuracy: 0.9129 - val_loss: 13.7545 - val_sparse_categorical_accuracy: 0.8460
Epoch 19/20
21/21 [==============================] - 20s 934ms/step - loss: 4.8524 - sparse_categorical_accuracy: 0.9127 - val_loss: 13.9475 - val_sparse_categorical_accuracy: 0.8542
Epoch 20/20
21/21 [==============================] - 20s 931ms/step - loss: 4.6092 - sparse_categorical_accuracy: 0.9190 - val_loss: 13.7851 - val_sparse_categorical_accuracy: 0.8351
Out[25]:
<tensorflow.python.keras.callbacks.History at 0x7f9b3c586ad0>

85% accuracy after ~20 epochs. Not bad!

Fully Convolutional Neural Networks¶

drawing

What if we want to see which pixels correspond to each class? We want an end-to-end learning system for which the output has the same pixel dimensionality as the input. Why might this be interesting?

  • interpretability. We want to check the 'semantic' understanding of our model and verify that it meets our expectations. Which pixels are leading to the label of 'cat'? This task is thus called 'semantic segmentation'.
  • instance separation. Localising specific objects in pixel space.
  • physical interpretations, e.g. medical imagery, solar PV localisation.

We want an architecture that gives a pixel-wise classification.

We need some way to increase the spatial dimensions of our data. What can we do? Upsample and convolve!

drawing

aka: transposed convolution, fractionally-strided convolution, "deconvolution"

[image ref: Vincent Dumoulin, Francesco Visin]

As we do this we'll also probably want to decrease the channel dimension of our data, mirroring what we did during convolution.
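One way to see 'upsample and convolve' (a sketch of the idea, not how frameworks implement transposed convolution internally): insert zeros between the input pixels to fractionally stride the input, then run an ordinary convolution. Using the `convolve2d` we imported earlier:

```python
import numpy as np
from scipy.signal import convolve2d

def upsample_and_convolve(x, kernel, stride=2):
    """Zero-insertion upsampling followed by an ordinary convolution --
    one view of 'fractionally-strided' / transposed convolution."""
    h, w = x.shape
    up = np.zeros((h * stride, w * stride))
    up[::stride, ::stride] = x            # insert zeros between the input pixels
    return convolve2d(up, kernel, mode='same')

x = np.arange(16, dtype=float).reshape(4, 4)   # a tiny 4x4 feature map
kernel = np.ones((3, 3)) / 9.0                 # smoothing kernel spreads values out
y = upsample_and_convolve(x, kernel)
print(y.shape)  # (8, 8): spatial dimensions doubled
```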

Fully-Convolutional Architectures¶

Fully Convolutional Net 2014

drawing

Contribution:

  • single transposed convolutional layer
  • concatenated and upsampled lower features

Shelhamer, E., Long, J., & Darrell, T. (2016) Fully Convolutional Networks for Semantic Segmentation.

U-Net 2015

drawing

Contribution:

  • convolutional blocks after transposing
  • symmetrical encoding / decoding

Ronneberger, O., Fischer, P., & Brox, T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation.

More recent contributions overcome the fundamental spatial-semantic tradeoff.

DeepLab 2017

Atrous Convolutions Atrous Spatial Pyramid Pooling
drawing drawing

Contribution:

  • use atrous (dilated) convolution to gather information from a larger receptive field instead of pooling

Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2017) DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs.
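Atrous ('with holes') convolution can be sketched by dilating the kernel with zeros before an ordinary convolution, widening the receptive field with no extra parameters (again a NumPy sketch of the idea, not DeepLab's implementation):

```python
import numpy as np
from scipy.signal import convolve2d

def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between kernel taps: a 3x3 kernel at
    rate 2 covers a 5x5 receptive field with only 9 weights."""
    k = kernel.shape[0]
    size = rate * (k - 1) + 1
    dilated = np.zeros((size, size))
    dilated[::rate, ::rate] = kernel
    return dilated

kernel = np.ones((3, 3))
atrous = dilate_kernel(kernel, rate=2)
print(atrous.shape)  # (5, 5) receptive field, still only 9 non-zero weights

x = np.random.rand(16, 16)
y = convolve2d(x, atrous, mode='same')  # same spatial resolution, wider context
print(y.shape)  # (16, 16)
```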

HRNet 2020

drawing

Contribution:

  • train low-resolution features in parallel with high-resolution features, exchanging information regularly

Wang, J., Sun, K., Cheng, T et al. (2020) Deep High-Resolution Representation Learning for Visual Recognition

Want more info, applications? A nice up-to-date blog post

Let's Code! TF + Transfer Learning + FCNNs¶

Now let's use transfer learning and transpose convolution to make an FCNN.

We'll need a new dataset, one that has segmentation labels. We also want one that isn't too big, so we can use it for this demo.

Let's use PASCAL VOC (Visual Object Classes), from Oxford: the original dataset used for computer vision challenges from 2005 to 2012.

Warning: untarring within a GCP notebook environment has unstable behaviour.

In [ ]:
### Let's download our data using wget, same as we did for Flowers. This is a bigger dataset (~2GB) so be warned!
!wget -c http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar -O {root}/data/voc2012.tar
In [4]:
# in case robots.ox.ac.uk is broken I've mirrored it on GCP:
!wget -c https://storage.googleapis.com/voc-mirror/VOCtrainval_11-May-2012.tar -O {root}/data/voc2012.tar

In [ ]:
### Now let's untar our data. (note: specify the directory with -C, and untar options with -x (extract),-v (verbose),-f (pass filename),-z (gzip))
!tar -C {root}/data -xvf {root}/data/voc2012.tar
In [ ]:
### Cleanup - remove our original .tar file.
!rm {root}/data/voc2012.tar

Alternative for unstable untar behaviour:

(use this if you are in a Jupyter Lab environment and untar is giving you memory spikes)

In [1]:
### use the google storage api to download the files directly.
from google.cloud import storage
In [2]:
storage_client = storage.Client()
bucket = storage_client.bucket('voc-mirror')
In [11]:
blobs=[b for b in bucket.list_blobs(prefix='VOCdevkit/')]
In [ ]:
for ii_b, b in enumerate(blobs):
    if ii_b %100==0:
        print (ii_b, ii_b/len(blobs))
    savepath = os.path.join(root,'data',b.name)
    if not os.path.exists(os.path.split(savepath)[0]):
        os.makedirs(os.path.split(savepath)[0])
    b.download_to_filename(savepath)

Step 1: set up our data records¶

VOC 2012 conveniently includes a list of unique ids with image and segmentation data. Let's read that list and then set up a list of dicts with image and annotation paths. (Why a list? It's a basic Python type and is easy to shuffle.)

In [4]:
# read in unique idxs from 'trainval.txt'. We'll split training and validation later.
with open(os.path.join(root,'data','VOCdevkit','VOC2012','ImageSets','Segmentation','trainval.txt'),'r') as f:
    idxs = [line.strip() for line in f.readlines()]  # strip any line breaks, returns, white space
    
print (idxs[0], idxs[-1])
2007_000032 2011_003271
In [5]:
# make a records list of dicts
records = [
    {
        'image':os.path.join(root,'data','VOCdevkit','VOC2012','JPEGImages',idx+'.jpg'),
        'annotation':os.path.join(root,'data','VOCdevkit','VOC2012','SegmentationClass',idx+'.png'),
    }
    for idx in idxs
]

Step 2: Inspect our data¶

As usual, we want to inspect our data to make sure we know what it contains, how to open it, develop intuition about it, etc.

In [6]:
# randomly shuffle our records
shuffle(records)
In [7]:
### visualise our data
fig, axs = plt.subplots(4,6,figsize=(20,12))
axs=axs.flatten()
for ii_r,r in enumerate(records[0:12]):
    img = imread(r['image'])
    ann = imread(r['annotation'])
    axs[2*ii_r].imshow(img)
    axs[2*ii_r+1].imshow(ann)
plt.show()
In [8]:
imread(records[0]['annotation']).shape
Out[8]:
(500, 333, 4)

Ah. So all our annotations are 4-channel (RGBA) PNGs! That's annoying!

Step 3: Build our generators¶

As before, our generator will need to load our data and preprocess it. In this case, it will also need to load our annotations as images, and then convert them to a mask that we can use as a training target.

In [9]:
### We can conveniently find the colormap that VOC2012 uses to label its objects: https://albumentations.ai/docs/autoalbument/examples/pascal_voc/
VOC_COLORMAP = {
    "background":    [0, 0, 0],
    "aeroplane":    [128, 0, 0],
    "bicycle":    [0, 128, 0],
    "bird":    [128, 128, 0],
    "boat":    [0, 0, 128],
    "bottle":    [128, 0, 128],
    "bus":    [0, 128, 128],
    "car":    [128, 128, 128],
    "cat":    [64, 0, 0],
    "chair":    [192, 0, 0],
    "cow":    [64, 128, 0],
    "diningtable":    [192, 128, 0],
    "dog":    [64, 0, 128],
    "horse":    [192, 0, 128],
    "motorbike":    [64, 128, 128],
    "person":    [192, 128, 128],
    "potted plant":    [0, 64, 0],
    "sheep":    [128, 64, 0],
    "sofa":    [0, 192, 0],
    "train":    [128, 192, 0],
    "tv/monitor":    [0, 64, 128],
}
In [10]:
### let's use our colormap to make a mask-generating function
def get_mask(image):
    # image: a 3D (H x W x RGB) numpy array of the annotation image (alpha channel already dropped by the caller)
    
    height, width = image.shape[:2]
    segmentation_mask = np.zeros((height, width, len(VOC_COLORMAP.keys())), dtype=np.float32)
    for label_index, (key, rgb_value) in enumerate(VOC_COLORMAP.items()):
        segmentation_mask[:, :, label_index] = np.all(image == rgb_value, axis=-1).astype(float)
    return segmentation_mask
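As a quick, standalone sanity check of the comparison logic in `get_mask`, we can run the same `np.all(image == rgb_value, axis=-1)` trick on a toy 1x2 "annotation" (the two-colour colormap here is just an illustrative subset of `VOC_COLORMAP`):

```python
import numpy as np

# tiny 1x2 "annotation": one background pixel, one aeroplane pixel
ann = np.array([[[0, 0, 0], [128, 0, 0]]])  # shape (1, 2, 3)

colormap = {"background": [0, 0, 0], "aeroplane": [128, 0, 0]}
mask = np.zeros(ann.shape[:2] + (len(colormap),), dtype=np.float32)
for i, rgb in enumerate(colormap.values()):
    # a pixel belongs to class i iff all three of its channels match the class colour
    mask[:, :, i] = np.all(ann == rgb, axis=-1).astype(float)

print(mask[0, 0], mask[0, 1])  # -> [1. 0.] [0. 1.]
```

Each pixel ends up one-hot across the class axis, which is exactly the target format our loss will expect.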
In [11]:
### let's inspect our masks to make sure they're generating properly
for ii in np.random.choice(len(records),3):
    fig, axs = plt.subplots(1,3,figsize=(9,3))
    img = imread(records[ii]['image'])
    ann = imread(records[ii]['annotation'])
    mask = get_mask(ann[:,:,0:3])   # need to drop the last channel (the alpha/transparency channel)
    axs[0].imshow(img)
    axs[1].imshow(ann)
    axs[2].imshow(mask.argmax(axis=-1), vmax=21, vmin=0) # use argmax to revert from one-hot to logit encoding
    print (f'image {ii}:',[{f'class: {jj}',f'label: {list(VOC_COLORMAP.keys())[jj]}'} for jj in np.unique(mask.argmax(axis=-1))])
    plt.show()
image 358: [{'class: 0', 'label: background'}, {'label: sheep', 'class: 17'}]
image 2882: [{'class: 0', 'label: background'}, {'label: car', 'class: 7'}, {'class: 8', 'label: cat'}]
image 2584: [{'class: 0', 'label: background'}, {'class: 8', 'label: cat'}]
In [12]:
def voc2012_generator(records, output_shape=(200,200), mode='random_crop'):
    ### a wrapper for our generator. Takes all our parameters and returns the generator.

    def _generator():
        ### The internal generator must not take any parameters.

        for r in records:

            # io
            x = (imread(r['image'])).astype(np.float32)  # <- again, don't normalise.
            ann = imread(r['annotation'])[:,:,0:3]       # drop the alpha channel
            y = get_mask(ann)                            # WHC, float32

            # reduce dimension of array
            if mode=='resize':
                x = resize(x,output_shape)
                y = resize(y,output_shape)   # NB: interpolation softens the one-hot mask; nearest-neighbour (order=0 in skimage) would keep it binary
            elif mode=='random_crop':
                crop_width = [(0,0)]*3
                pad_width  = [(0,0)]*3
                for ax in [0,1]:
                    if x.shape[ax]>output_shape[ax]:
                        crop_val=np.random.choice(x.shape[ax]-output_shape[ax])
                        crop_width[ax] = (crop_val, x.shape[ax]-output_shape[ax]-crop_val)
                    elif x.shape[ax]<output_shape[ax]:
                        pad_val = np.random.choice(output_shape[ax]-x.shape[ax])
                        pad_width[ax] = (pad_val,output_shape[ax]- x.shape[ax]-pad_val)
                        
                x = crop(x, crop_width)
                x = pad(x, pad_width)
                y = crop(y, crop_width)
                y = pad(y, pad_width)

            yield tf.convert_to_tensor(x), tf.convert_to_tensor(y)
            
    return _generator
In [13]:
trn_split=0.7
val_split=0.9
In [14]:
generator_obj_trn = voc2012_generator(
    records[0:int(trn_split*len(records))], 
    output_shape=(224,224), 
    mode='resize'
)
generator_obj_val = voc2012_generator(
    records[int(trn_split*len(records)):int(val_split*len(records))], 
    output_shape=(224,224), 
    mode='resize'
)
generator_obj_test = voc2012_generator(
    records[int(val_split*len(records)):], 
    output_shape=(224,224), 
    mode='resize'
)
In [15]:
ds_voc_trn = (
    tf.data.Dataset.from_generator(
     generator_obj_trn,
     output_signature=(
         tf.TensorSpec(shape=(224,224,3), dtype=tf.float32),
         tf.TensorSpec(shape=(224,224,21), dtype=tf.float32)))  # <- new shape!
    ) \
    .cache().batch(64).prefetch(tf.data.experimental.AUTOTUNE)

ds_voc_val = (
    tf.data.Dataset.from_generator(
     generator_obj_val,
     output_signature=(
         tf.TensorSpec(shape=(224,224,3), dtype=tf.float32),
         tf.TensorSpec(shape=(224,224,21), dtype=tf.float32)))  # <- new shape
    ) \
    .cache().batch(64).prefetch(tf.data.experimental.AUTOTUNE)

ds_voc_test = (
    tf.data.Dataset.from_generator(
     generator_obj_test,
     output_signature=(
         tf.TensorSpec(shape=(224,224,3), dtype=tf.float32),
         tf.TensorSpec(shape=(224,224,21), dtype=tf.float32)))  # <- new shape
    ) \
    .cache().batch(64).prefetch(tf.data.experimental.AUTOTUNE)
In [17]:
ds_voc_trn = ds_voc_trn.map(vgg16_premapper, num_parallel_calls=tf.data.experimental.AUTOTUNE) 
ds_voc_val = ds_voc_val.map(vgg16_premapper, num_parallel_calls=tf.data.experimental.AUTOTUNE) 
ds_voc_test = ds_voc_test.map(vgg16_premapper, num_parallel_calls=tf.data.experimental.AUTOTUNE) 

Let's check the shape of our data to confirm we're getting what we expect.

In [18]:
a,b = next(ds_voc_trn.as_numpy_iterator())
In [19]:
a.shape, a.max(), a.min(), b.shape, b.max(), b.min()
Out[19]:
((64, 224, 224, 3), 151.061, -123.68, (64, 224, 224, 21), 1.0, 0.0)
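Those min/max values make sense once you know what the premapper is doing: `tf.keras.applications.vgg16.preprocess_input` uses Keras' 'caffe' mode, which flips RGB to BGR and subtracts the per-channel ImageNet means rather than scaling to [0, 1]. Here's a standalone numpy sketch of that transform (we're assuming `vgg16_premapper` wraps `preprocess_input`, as set up earlier in the notebook):

```python
import numpy as np

# per-channel ImageNet means, in BGR order, as used by Keras' 'caffe' mode
IMAGENET_MEANS_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_preprocess_np(x_rgb):
    """Numpy sketch of Keras 'caffe'-mode preprocessing:
    flip RGB -> BGR, then subtract the per-channel ImageNet means."""
    x_bgr = x_rgb[..., ::-1].astype(np.float32)
    return x_bgr - IMAGENET_MEANS_BGR

# a black pixel maps to the most negative possible value, -123.68,
# which is exactly the a.min() we saw above
black = np.zeros((1, 1, 3), dtype=np.float32)
print(vgg16_preprocess_np(black).min())  # -> -123.68
```

So our inputs are centred but not rescaled, which is why `a.min()` is -123.68 rather than 0.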

Step 4: Build our model¶

Let's build a fully convolutional neural network for our semantic segmentation problem. We'll want a pretrained encoder and then we'll train a decoder using our new data. We'll add a header to do the final mapping to the output classes.

In [20]:
vgg_encoder = tf.keras.applications.VGG16(
    include_top=False, 
    input_shape=(224,224,3),
    weights='imagenet', 
    pooling=None,         # <- in this case, we don't want any pooling on our final outputs
)
In [21]:
# as before, let's make our encoder not trainable.
vgg_encoder.trainable = False
In [22]:
vgg_encoder.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
=================================================================
Total params: 14,714,688
Trainable params: 0
Non-trainable params: 14,714,688
_________________________________________________________________
In [23]:
# a function to build our up-blocks, flexible in the number of filters and conv layers
def UpBlock(n_filters, n_blocks):
    block_layers = []
    
    for _ in range(n_blocks):
        block_layers.append(tf.keras.layers.Activation('relu'))
        block_layers.append(tf.keras.layers.Conv2DTranspose(n_filters, 3, padding="same"))
        block_layers.append(tf.keras.layers.BatchNormalization())
        
    # and add the upsampling layer
    block_layers.append(tf.keras.layers.UpSampling2D(2))
    
    return block_layers
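A quick check on the decoder geometry: VGG-16's five max-pooling stages downsample the 224x224 input by 2^5 = 32x to 7x7, so the decoder needs five 2x upsampling stages to recover full resolution (the initial `UpSampling2D` plus one per `UpBlock`):

```python
# five 2x upsamplings invert VGG-16's five 2x poolings: 7 -> 14 -> 28 -> 56 -> 112 -> 224
size = 7
for _ in range(5):
    size *= 2
print(size)  # -> 224
```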
In [24]:
decoder = tf.keras.models.Sequential(
    [tf.keras.layers.UpSampling2D(2, input_shape=(7,7,512))] 
    + UpBlock(n_filters=256,n_blocks=1)
    + UpBlock(128,1)
    + UpBlock(64,1)
    + UpBlock(32,1),
    name='decoder'
)
In [27]:
decoder.summary()
Model: "decoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
up_sampling2d (UpSampling2D) (None, 14, 14, 512)       0         
_________________________________________________________________
activation (Activation)      (None, 14, 14, 512)       0         
_________________________________________________________________
conv2d_transpose (Conv2DTran (None, 14, 14, 256)       1179904   
_________________________________________________________________
batch_normalization (BatchNo (None, 14, 14, 256)       1024      
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 28, 28, 256)       0         
_________________________________________________________________
activation_1 (Activation)    (None, 28, 28, 256)       0         
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 28, 28, 128)       295040    
_________________________________________________________________
batch_normalization_1 (Batch (None, 28, 28, 128)       512       
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 56, 56, 128)       0         
_________________________________________________________________
activation_2 (Activation)    (None, 56, 56, 128)       0         
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 56, 56, 64)        73792     
_________________________________________________________________
batch_normalization_2 (Batch (None, 56, 56, 64)        256       
_________________________________________________________________
up_sampling2d_3 (UpSampling2 (None, 112, 112, 64)      0         
_________________________________________________________________
activation_3 (Activation)    (None, 112, 112, 64)      0         
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 112, 112, 32)      18464     
_________________________________________________________________
batch_normalization_3 (Batch (None, 112, 112, 32)      128       
_________________________________________________________________
up_sampling2d_4 (UpSampling2 (None, 224, 224, 32)      0         
=================================================================
Total params: 1,569,120
Trainable params: 1,568,160
Non-trainable params: 960
_________________________________________________________________
In [25]:
header = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, 3, padding="same", input_shape=(224,224,32)), 
    tf.keras.layers.Conv2D(21, 1, padding="same", activation='softmax')
], name='header')
In [26]:
header.summary()
Model: "header"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 224, 224, 32)      9248      
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 224, 224, 21)      693       
=================================================================
Total params: 9,941
Trainable params: 9,941
Non-trainable params: 0
_________________________________________________________________
In [28]:
model = tf.keras.models.Sequential([
    vgg_encoder,
    decoder,
    header
])
In [29]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
vgg16 (Functional)           (None, 7, 7, 512)         14714688  
_________________________________________________________________
decoder (Sequential)         (None, 224, 224, 32)      1569120   
_________________________________________________________________
header (Sequential)          (None, 224, 224, 21)      9941      
=================================================================
Total params: 16,293,749
Trainable params: 1,578,101
Non-trainable params: 14,715,648
_________________________________________________________________

Step 5: Train!¶

In [30]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),                          
    loss=tf.keras.losses.CategoricalCrossentropy(), #<- in this case, classes are one-hot encoded
    metrics=[tf.keras.metrics.CategoricalAccuracy()],
)
In [31]:
model.fit(
    ds_voc_trn,
    epochs=5,
    validation_data=ds_voc_val,
)
Epoch 1/5
32/32 [==============================] - 861s 26s/step - loss: 1.6911 - categorical_accuracy: 0.6471 - val_loss: 14.7482 - val_categorical_accuracy: 0.1532
Epoch 2/5
32/32 [==============================] - 38s 1s/step - loss: 0.6837 - categorical_accuracy: 0.8272 - val_loss: 3.8629 - val_categorical_accuracy: 0.5857
Epoch 3/5
32/32 [==============================] - 39s 1s/step - loss: 0.5317 - categorical_accuracy: 0.8422 - val_loss: 1.5206 - val_categorical_accuracy: 0.7351
Epoch 4/5
32/32 [==============================] - 39s 1s/step - loss: 0.4423 - categorical_accuracy: 0.8584 - val_loss: 0.7096 - val_categorical_accuracy: 0.8336
Epoch 5/5
32/32 [==============================] - 39s 1s/step - loss: 0.3936 - categorical_accuracy: 0.8706 - val_loss: 0.6458 - val_categorical_accuracy: 0.8335
Out[31]:
<tensorflow.python.keras.callbacks.History at 0x7fae7c6fcdd0>

Step 6: Inspect and Analyse¶

In [32]:
X, Y = next(ds_voc_val.as_numpy_iterator())
In [33]:
Y_hat = model.predict(X)
In [34]:
### due to VGG preprocessing, we need to recover our initial image.
# https://stackoverflow.com/questions/55987302/reversing-the-image-preprocessing-of-vgg-in-keras-to-return-original-image
def deprocess_img(processed_img):
    x = processed_img.copy()
  
    # perform the inverse of the preprocessing step: add back the BGR channel means, then flip BGR -> RGB
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    x = x[:, :, ::-1]

    x = np.clip(x, 0, 255).astype('uint8')
    return x
In [37]:
### let's inspect our predictions against the ground-truth masks
N = 5
fig, axs = plt.subplots(3,N,figsize=(N*2,N))
for ii, ii_r in enumerate(np.random.choice(X.shape[0],N, replace=False)):
    axs[0,ii].imshow(deprocess_img(X[ii_r,:,:,:]))
    axs[1,ii].imshow(Y[ii_r,...].argmax(axis=-1), vmax=21, vmin=0) # use argmax to revert from one-hot to logit encoding
    axs[2,ii].imshow(Y_hat[ii_r,...].argmax(axis=-1), vmax=21, vmin=0) # use argmax to revert from one-hot to logit encoding
plt.show()

Okay, so it looks like we're doing okay with the class identification, but not the shape: the mask has to be reconstructed entirely from the coarse 7x7 bottleneck, so fine spatial detail is lost.

Step 7: Iterate¶

Let's try some U-Net-like bridging connections from low-level encoder features.

In [38]:
def make_model():
    
    # we don't need the last max-pool layer; take the 14x14 feature map from block5_conv3
    encoder_output = vgg_encoder.get_layer('block5_conv3').output
    
    # get the featuremaps from the encoder so we can bridge them to the decoder
    block1_featuremap = vgg_encoder.get_layer('block1_conv2').output  # 224x224
    block2_featuremap = vgg_encoder.get_layer('block2_conv2').output  # 112x112
    
    # VGG doesn't have batch norm so let's do it ourselves:
    block1_featuremap = tf.keras.layers.Activation('relu')(block1_featuremap)
    block1_featuremap = tf.keras.layers.BatchNormalization()(block1_featuremap)
    block2_featuremap = tf.keras.layers.Activation('relu')(block2_featuremap)
    block2_featuremap = tf.keras.layers.BatchNormalization()(block2_featuremap)
    
    # the decoder with bridging:
    x = tf.keras.models.Sequential(UpBlock(128,2))(encoder_output)      # 28x28
    x = tf.keras.models.Sequential(UpBlock(64,2))(x)                   # 56x56
    x = tf.keras.models.Sequential(UpBlock(32,2))(x)                    # 112x112
    x = tf.keras.layers.Concatenate()([block2_featuremap,x])          # 112x112
    x = tf.keras.models.Sequential(UpBlock(32,2))(x)                    # 224x224
    x = tf.keras.layers.Concatenate()([block1_featuremap,x])          # 224x224
    
    # add some header layers:
    x = tf.keras.layers.Activation('relu')(x)
    x = tf.keras.layers.Conv2D(32, 3, padding="same")(x)
    output = tf.keras.layers.Conv2D(21, 1, padding="same", activation='softmax')(x)

    return tf.keras.models.Model(vgg_encoder.input, output)
In [39]:
model = make_model()
In [40]:
model.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
block1_conv1 (Conv2D)           (None, 224, 224, 64) 1792        input_1[0][0]                    
__________________________________________________________________________________________________
block1_conv2 (Conv2D)           (None, 224, 224, 64) 36928       block1_conv1[0][0]               
__________________________________________________________________________________________________
block1_pool (MaxPooling2D)      (None, 112, 112, 64) 0           block1_conv2[0][0]               
__________________________________________________________________________________________________
block2_conv1 (Conv2D)           (None, 112, 112, 128 73856       block1_pool[0][0]                
__________________________________________________________________________________________________
block2_conv2 (Conv2D)           (None, 112, 112, 128 147584      block2_conv1[0][0]               
__________________________________________________________________________________________________
block2_pool (MaxPooling2D)      (None, 56, 56, 128)  0           block2_conv2[0][0]               
__________________________________________________________________________________________________
block3_conv1 (Conv2D)           (None, 56, 56, 256)  295168      block2_pool[0][0]                
__________________________________________________________________________________________________
block3_conv2 (Conv2D)           (None, 56, 56, 256)  590080      block3_conv1[0][0]               
__________________________________________________________________________________________________
block3_conv3 (Conv2D)           (None, 56, 56, 256)  590080      block3_conv2[0][0]               
__________________________________________________________________________________________________
block3_pool (MaxPooling2D)      (None, 28, 28, 256)  0           block3_conv3[0][0]               
__________________________________________________________________________________________________
block4_conv1 (Conv2D)           (None, 28, 28, 512)  1180160     block3_pool[0][0]                
__________________________________________________________________________________________________
block4_conv2 (Conv2D)           (None, 28, 28, 512)  2359808     block4_conv1[0][0]               
__________________________________________________________________________________________________
block4_conv3 (Conv2D)           (None, 28, 28, 512)  2359808     block4_conv2[0][0]               
__________________________________________________________________________________________________
block4_pool (MaxPooling2D)      (None, 14, 14, 512)  0           block4_conv3[0][0]               
__________________________________________________________________________________________________
block5_conv1 (Conv2D)           (None, 14, 14, 512)  2359808     block4_pool[0][0]                
__________________________________________________________________________________________________
block5_conv2 (Conv2D)           (None, 14, 14, 512)  2359808     block5_conv1[0][0]               
__________________________________________________________________________________________________
block5_conv3 (Conv2D)           (None, 14, 14, 512)  2359808     block5_conv2[0][0]               
__________________________________________________________________________________________________
sequential_1 (Sequential)       (None, 28, 28, 128)  738560      block5_conv3[0][0]               
__________________________________________________________________________________________________
activation_5 (Activation)       (None, 112, 112, 128 0           block2_conv2[0][0]               
__________________________________________________________________________________________________
sequential_2 (Sequential)       (None, 56, 56, 64)   111232      sequential_1[0][0]               
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 112, 112, 128 512         activation_5[0][0]               
__________________________________________________________________________________________________
sequential_3 (Sequential)       (None, 112, 112, 32) 27968       sequential_2[0][0]               
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 224, 224, 64) 0           block1_conv2[0][0]               
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 112, 112, 160 0           batch_normalization_5[0][0]      
                                                                 sequential_3[0][0]               
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 224, 224, 64) 256         activation_4[0][0]               
__________________________________________________________________________________________________
sequential_4 (Sequential)       (None, 224, 224, 32) 55616       concatenate[0][0]                
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 224, 224, 96) 0           batch_normalization_4[0][0]      
                                                                 sequential_4[0][0]               
__________________________________________________________________________________________________
activation_14 (Activation)      (None, 224, 224, 96) 0           concatenate_1[0][0]              
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 224, 224, 32) 27680       activation_14[0][0]              
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 224, 224, 21) 693         conv2d_2[0][0]                   
==================================================================================================
Total params: 15,677,205
Trainable params: 961,109
Non-trainable params: 14,716,096
__________________________________________________________________________________________________
In [41]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),                          
    loss=tf.keras.losses.CategoricalCrossentropy(), #<- in this case, classes are one-hot encoded
    metrics=[tf.keras.metrics.CategoricalAccuracy()],
)
In [42]:
model.fit(
    ds_voc_trn,
    epochs=5,
    validation_data=ds_voc_val,
)
Epoch 1/5
32/32 [==============================] - 63s 2s/step - loss: 1.6185 - categorical_accuracy: 0.6113 - val_loss: 3.9719 - val_categorical_accuracy: 0.4744
Epoch 2/5
32/32 [==============================] - 54s 2s/step - loss: 0.7886 - categorical_accuracy: 0.7705 - val_loss: 1.6923 - val_categorical_accuracy: 0.6200
Epoch 3/5
32/32 [==============================] - 54s 2s/step - loss: 0.6383 - categorical_accuracy: 0.7999 - val_loss: 1.1082 - val_categorical_accuracy: 0.7022
Epoch 4/5
32/32 [==============================] - 54s 2s/step - loss: 0.5530 - categorical_accuracy: 0.8230 - val_loss: 0.9435 - val_categorical_accuracy: 0.7439
Epoch 5/5
32/32 [==============================] - 54s 2s/step - loss: 0.5729 - categorical_accuracy: 0.8243 - val_loss: 0.8681 - val_categorical_accuracy: 0.7499
Out[42]:
<tensorflow.python.keras.callbacks.History at 0x7fae7fc3a550>
In [43]:
X, Y = next(ds_voc_val.as_numpy_iterator())
In [44]:
Y_hat = model.predict(X)
In [46]:
### let's inspect our masks to see how they're doing now
N = 5
fig, axs = plt.subplots(3,N,figsize=(N*2,N))
for ii, ii_r in enumerate(np.random.choice(X.shape[0],N, replace=False)):
    axs[0,ii].imshow(deprocess_img(X[ii_r,:,:,:]))
    axs[1,ii].imshow(Y[ii_r,...].argmax(axis=-1), vmax=21, vmin=0) # use argmax to revert from one-hot to logit encoding
    axs[2,ii].imshow(Y_hat[ii_r,...].argmax(axis=-1), vmax=21, vmin=0) # use argmax to revert from one-hot to logit encoding
    for jj in range(3):
        axs[jj,ii].axis('off')
plt.show()

Step 8: Evaluate¶

In [65]:
### Let's get the confusion matrix for the test set
cs = []
for X, Y in ds_voc_test:
    Y_hat = model.predict(X)
    C = confusion_matrix(Y.numpy().argmax(axis=-1).flatten(), Y_hat.argmax(axis=-1).flatten())
    cs.append(C)
In [66]:
confusion_arr = np.array(cs).sum(axis=0)
In [67]:
fig, ax = plt.subplots(1,1, figsize=(12,12))
vmax = confusion_arr[1:,1:].max()
ax.imshow(confusion_arr, vmax=vmax)
ax.set_yticks(range(len(VOC_COLORMAP.keys())))
ax.set_yticklabels(list(VOC_COLORMAP.keys()))
ax.set_xticks(range(len(VOC_COLORMAP.keys())))
ax.set_xticklabels(list(VOC_COLORMAP.keys()), rotation=90)
plt.show()
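The confusion matrix also gives us the standard semantic-segmentation metric almost for free: per-class intersection-over-union (IoU), where for each class the intersection is the diagonal entry and the union is (row sum + column sum - diagonal). A standalone sketch on a toy 2-class matrix (apply it to `confusion_arr` in exactly the same way):

```python
import numpy as np

def per_class_iou(confusion):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c), from a confusion matrix with
    true classes on the rows and predicted classes on the columns."""
    confusion = confusion.astype(np.float64)
    tp = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - tp
    return tp / np.maximum(union, 1)  # guard against classes absent from the split

toy = np.array([[8, 2],
                [1, 9]])
print(per_class_iou(toy))
```

Mean IoU is then just `per_class_iou(confusion_arr).mean()`, usually reported excluding the background class.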

Tutorial: Connecting Colab to a GPU runtime¶

  1. Open a Google Colab notebook like this one: https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/quickstart/beginner.ipynb

  2. From Runtime choose Change runtime type and under Hardware Accelerator select GPU. Congrats! You are now ready for GPU-enhanced deep learning!

  3. Test your GPU setup with tensorflow:

     import tensorflow as tf
     tf.config.list_physical_devices()
  4. You can also run:
    !nvidia-smi
  5. [Optional] Select your kitties, corgies, and power level from Tools -> Settings -> Miscellaneous

Tutorial: Machine Learning Experiments¶

We've seen that designing a powerful machine learning system requires a combination of intuition, experimentation, and computational resources. How can we systematically track the ML experiments we run? How can we keep an eye on them as they progress?

Let's introduce two new libraries: sacred and tensorboard.

Sacred is an open-source machine learning experimentation framework that allows users to track their experiments and retain their configurations and results.

TensorBoard is a visualisation toolkit that allows users to watch their models training in real-time, and log and visualise training progress.

We may also want to write our own custom components, for example:

  • custom loss functions
  • custom outputs and monitoring
  • custom training loops with multiple models, asynchronous updates, etc., e.g. GANs
  • custom training curricula e.g. pretrain-train then train

Let's see how we can write a custom ML experiment using Sacred and Tensorboard that does all these things.

NB: this tutorial should be run in a local or cloud Jupyter lab environment.

Setting up our directories for successfully tracking experiments:

root                               // repository root for version control
│   README.md                      // repo README with usage instructions
│   cli.py                         // good practice - you want an entrypoint for your project
|   runner.py                      // (optional) abstract your experiment entrypoint away from your CLI
|   conf.yml                       // you can use yaml for human-readable config files
|
└───myproject                      // where the actual code is kept
|   |   __init__.py                // initialise your experiment here
|   |   train.py                   // code containing your training loop
|   |   ... eval.py, loss.py, etc. // you may want to break out other modules for tidiness
|   |   main.py                    // code for running your ML curriculum
|   └───models                     // you may want to break out your model-generating scripts
|       |   mymodel.py             // keeps your ML models tidy
|
└───experiments                    // a directory for all your experiment data, gitignored
|   └───sacred                     // experiments subdirectory for sacred experiment files
|   └───tensorboard                // experiments subdirectory for tensorboard experiment files
│
└───data                           // a directory for all your data, gitignored

cli.py -> command line interface, tells users how to use your project and allows cli execution, e.g. python cli.py mycommand --option=value

data/ and experiments/ can also be paths to mounted disks, usually at, e.g., /mnt/data/. This setup is suitable for small-to-large ML projects (up to ~5 TB).

Sacred uses Experiments and Observers to set up and track ML experiments. Sacred can also capture stdout/stderr output, and save files using artifacts.

Observers allow you to mirror the same experiment metrics and results to multiple locations: file paths, Mongo DBs, AWS, GCP, and Azure cloud storage.

Explore this repo to see how to use these tools.